Quantized Compressive K-Means
The recent framework of compressive statistical learning aims at designing
tractable learning algorithms that use only a heavily compressed
representation, or sketch, of massive datasets. Compressive K-Means (CKM) is such
a method: it estimates the centroids of data clusters from pooled, non-linear,
random signatures of the learning examples. While this approach significantly
reduces computational time on very large datasets, its digital implementation
wastes acquisition resources because the learning examples are compressed only
after the sensing stage. The present work generalizes the sketching procedure
initially defined in Compressive K-Means to a large class of periodic
nonlinearities including hardware-friendly implementations that compressively
acquire entire datasets. This idea is exemplified in a Quantized Compressive
K-Means procedure, a variant of CKM that leverages 1-bit universal quantization
(i.e. retaining the least significant bit of a standard uniform quantizer) as
the periodic sketch nonlinearity. Trading the usual nonlinearity for this
resource-efficient signature (standard in most acquisition schemes) has almost
no impact on clustering performance, as illustrated by numerical experiments.
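The pooled 1-bit signature described above can be conveyed by a minimal NumPy sketch (an illustrative construction, not the authors' implementation; the dither `xi`, quantization step `delta`, and all dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def universal_1bit(t, delta=1.0):
    # Least significant bit of a uniform quantizer of step `delta`:
    # a binary square wave of period 2*delta, mapped to {-1, +1}.
    return 2.0 * (np.floor(t / delta) % 2) - 1.0

def quantized_sketch(X, W, xi, delta=1.0):
    # Pool the 1-bit signatures of all samples (rows of X) into a single
    # m-dimensional sketch: random projections W plus dither xi, passed
    # through the periodic 1-bit map, then averaged over the dataset.
    return universal_1bit(X @ W.T + xi, delta).mean(axis=0)

n, d, m = 1000, 2, 64                 # samples, dimension, sketch size
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))           # random projection directions
xi = rng.uniform(0, 2.0, size=m)      # random dither, one per projection
z = quantized_sketch(X, W, xi)        # each sketch entry lies in [-1, 1]
```

Each sample contributes only one bit per projection, which is what makes the signature hardware-friendly.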
Breaking the waves: asymmetric random periodic features for low-bitrate kernel machines
Many signal processing and machine learning applications are built from
evaluating a kernel on pairs of signals, e.g. to assess the similarity of an
incoming query to a database of known signals. This nonlinear evaluation can be
simplified to a linear inner product of the random Fourier features of those
signals: random projections followed by a periodic map, the complex
exponential. It is known that a simple quantization of those features
(corresponding to replacing the complex exponential by a different periodic map
that takes binary values, which is appealing for their transmission and
storage) distorts the approximated kernel, which may be undesirable in
practice. Our take-home message is that when the features of only one of the
two signals are quantized, the original kernel is recovered without distortion;
its practical interest appears in several cases where the kernel evaluations
are asymmetric by nature, such as a client-server scheme. Concretely, we
introduce the general framework of asymmetric random periodic features, where
the two signals of interest are observed through random periodic features:
random projections followed by a general periodic map, which is allowed to be
different for both signals. We derive the influence of those periodic maps on
the approximated kernel, and prove uniform probabilistic error bounds holding
for all signal pairs from an infinite low-complexity set. Interestingly, our
results allow the periodic maps to be discontinuous, thanks to a new
mathematical tool, i.e. the mean Lipschitz smoothness. We then apply this
generic framework to semi-quantized kernel machines (where only one signal has
quantized features and the other has classical random Fourier features), for
which we show theoretically that the approximated kernel remains unchanged
(with the associated error bound), and confirm the power of the approach with
numerical simulations.
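A minimal numerical check of the semi-quantized idea, assuming a Gaussian kernel and using sign(cos) as the binary periodic map; the pi/4 rescaling below comes from the first Fourier coefficient of the square wave and is an illustrative choice, not necessarily the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

d, m = 5, 20000
x = rng.normal(size=d)
y = x + 0.3 * rng.normal(size=d)       # a nearby query signal

# Random Fourier feature parameters for the Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / 2): Gaussian frequencies, uniform phase.
W = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

phi_x = np.cos(W @ x + b)              # classical (full-precision) features
phi_y_q = np.sign(np.cos(W @ y + b))   # binary (quantized) features

# Only the fundamental harmonic of sign(cos t) = (4/pi) * (cos t - ...)
# correlates with cos under the uniform phase, so the semi-quantized inner
# product estimates (4/pi) * k(x, y); rescale by pi/4 to compare.
k_true = np.exp(-np.linalg.norm(x - y) ** 2 / 2)
k_est = (np.pi / 4) * (2.0 / m) * (phi_x @ phi_y_q)
```

Only one of the two feature vectors is binary, matching the client-server asymmetry described in the abstract.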
Asymmetric compressive learning guarantees with applications to quantized sketches
The compressive learning framework reduces the computational cost of training
on large-scale datasets. In a sketching phase, the data is first compressed to
a lightweight sketch vector, obtained by mapping the data samples through a
well-chosen feature map, and averaging those contributions. In a learning
phase, the desired model parameters are then extracted from this sketch by
solving an optimization problem, which also involves a feature map. When the
feature map is identical during the sketching and learning phases, formal
statistical guarantees (excess risk bounds) have been proven.
However, the desirable properties of the feature map are different during
sketching and learning (e.g. quantized outputs, and differentiability,
respectively). We thus study the relaxation where this map is allowed to be
different for each phase. First, we prove that the existing guarantees carry
over to this asymmetric scheme, up to a controlled error term, provided some
Limited Projected Distortion (LPD) property holds. We then instantiate this
framework to the setting of quantized sketches, by proving that the LPD indeed
holds for binary sketch contributions. Finally, we further validate the
approach with numerical simulations, including a large-scale application in
audio event classification.
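To make the two phases concrete, here is a toy setup (a simplified illustration, not the paper's code) where the same data is sketched once with a binary map, as a sketching phase might use, and once with the smooth map that a learning phase would differentiate through:

```python
import numpy as np

rng = np.random.default_rng(2)

def sketch(X, feature_map):
    # Average the per-sample feature-map contributions into one sketch vector.
    return np.mean(feature_map(X), axis=0)

n, d, m = 500, 3, 128
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

# Sketching phase: binary (quantized) contributions, cheap to compute/store.
z_quantized = sketch(X, lambda X: np.sign(np.cos(X @ W.T + b)))

# Learning phase would instead work with the smooth, differentiable map:
z_smooth = sketch(X, lambda X: np.cos(X @ W.T + b))
```

The paper's LPD property controls the error incurred when the learning phase uses `z_quantized` in place of the sketch its smooth feature map would have produced.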
MM: A general method to perform various data analysis tasks from a differentially private sketch
Differential privacy is the standard privacy definition for performing
analyses over sensitive data. Yet, its privacy budget bounds the number of
tasks an analyst can perform with reasonable accuracy, which makes it
challenging to deploy in practice. This can be alleviated by private sketching,
where the dataset is compressed into a single noisy sketch vector which can be
shared with the analysts and used to perform arbitrarily many analyses.
However, the algorithms to perform specific tasks from sketches must be
developed on a case-by-case basis, which is a major impediment to their use. In
this paper, we introduce the generic moment-to-moment (MM) method to
perform a wide range of data exploration tasks from a single private sketch.
Among other things, this method can be used to estimate empirical moments of
attributes, the covariance matrix, counting queries (including histograms), and
regression models. Our method treats the sketching mechanism as a black-box
operation, and can thus be applied to a wide variety of sketches from the
literature, widening their ranges of applications without further engineering
or privacy loss, and removing some of the technical barriers to the wider
adoption of sketches for data exploration under differential privacy. We
validate our method with data exploration tasks on artificial and real-world
data, and show that it can be used to reliably estimate statistics and train
classification models from private sketches.Comment: Published at the 18th International Workshop on Security and Trust
Management (STM 2022
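The flavour of recovering moments from a noisy sketch can be conveyed by a deliberately simplified example (a plain linear sketch and least squares, not the actual MM method): the empirical mean is estimated from a noise-perturbed random projection of it, with the data itself never accessed at analysis time:

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, m = 1000, 4, 64
X = rng.normal(loc=1.5, size=(n, d))
W = rng.normal(size=(m, d))            # public sketching matrix

# Noisy linear sketch: z = W @ empirical_mean + privacy noise, since
# averaging the projected samples equals projecting the empirical mean.
z = (X @ W.T).mean(axis=0) + rng.laplace(scale=0.01, size=m)

# The analyst sees only (W, z) and recovers the first empirical moment
# (the mean) by least squares.
mean_est, *_ = np.linalg.lstsq(W, z, rcond=None)
```

The black-box spirit of the abstract is that the analyst only needs the sketch and its mechanism description, never the raw records.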
Diagnostic Performances of Anti-Cyclic Citrullinated Peptides Antibody and Antifilaggrin Antibody in Korean Patients with Rheumatoid Arthritis
Rheumatoid arthritis (RA) is a systemic autoimmune disease of unknown etiology. We studied the diagnostic performance of the anti-cyclic citrullinated peptides antibody (anti-CCP) assay and the recombinant anti-citrullinated filaggrin antibody (AFA) assay, both by enzyme-linked immunosorbent assay (ELISA), in patients with RA in Korea. The diagnostic performance of the anti-CCP and AFA assays was compared with that of the rheumatoid factor (RF) latex fixation test. RF, anti-CCP, and AFA assays were performed in 324 RA patients, 251 control patients, and 286 healthy subjects. The optimal cut-off value of each assay was determined at the maximal point of the area under the receiver operating characteristic (ROC) curve. The sensitivity (72.8%) and specificity (92.0%) of anti-CCP were better than those of AFA (70.3% and 70.5%, respectively). The diagnostic performance of RF showed a sensitivity of 80.6% and a specificity of 78.5%. Anti-CCP and AFA were positive in 23.8% and 17.3% of seronegative RA patients, respectively. In conclusion, anti-CCP could be a very useful serological assay for the diagnosis of RA, because it showed higher diagnostic specificity than RF and AFA at the optimal cut-off values and can be performed by an easy, convenient ELISA method.
Compressive Learning with Privacy Guarantees
This work addresses the problem of learning from large collections of data with privacy guarantees. The compressive learning framework proposes to deal with the large scale of datasets by compressing them into a single vector of generalized random moments, from which the learning task is then performed. We show that a simple perturbation of this mechanism with additive noise is sufficient to satisfy differential privacy, a well-established formalism for defining and quantifying the privacy of a random mechanism. We combine this with a feature subsampling mechanism, which reduces the computational cost without damaging privacy. The framework is applied to the tasks of Gaussian modeling, k-means clustering and principal component analysis (PCA), for which sharp privacy bounds are derived. Empirically, the quality (for subsequent learning) of the compressed representation produced by our mechanism is strongly related to the induced noise level, for which we give analytical expressions.
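A minimal sketch of the noise-perturbation mechanism, assuming cos/sin random-moment features and a Gaussian noise scale `sigma`; calibrating `sigma` to a target privacy level requires the paper's sensitivity bounds, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)

def private_sketch(X, W, b, sigma):
    # Generalized random moments: bounded cos/sin features averaged over
    # the dataset, then perturbed with additive Gaussian noise.
    Z = np.concatenate([np.cos(X @ W.T + b), np.sin(X @ W.T + b)], axis=1)
    return Z.mean(axis=0) + rng.normal(scale=sigma, size=Z.shape[1])

n, d, m = 500, 3, 50
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))
b = rng.uniform(0, 2 * np.pi, size=m)
z = private_sketch(X, W, b, sigma=0.05)   # noise level: illustrative only
```

Because each sample's contribution to the average is bounded, a single noise addition suffices; the learning task then operates on `z` alone.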